AITopics | closeness testing

Differentially Private Testing of Identity and Closeness of Discrete Distributions

Neural Information Processing Systems | Mar-15-2026, 18:30:43 GMT

We study the fundamental problems of identity testing (goodness of fit) and closeness testing (two-sample test) of distributions over k elements under differential privacy. While the problems have a long history in statistics, finite-sample bounds for these problems have only been established recently. In this work, we derive upper and lower bounds on the sample complexity of both problems under (ε, δ)-differential privacy. We provide sample-optimal algorithms for the identity testing problem for all parameter ranges, and the first results for closeness testing. Our closeness testing bounds are optimal in the sparse regime where the number of samples is at most k.
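
For reference, the standard definition of (ε, δ)-differential privacy that the abstract above relies on can be written as follows. The symbols A (the randomized testing algorithm), X and X' (two sample sets that differ in a single element), and S (a set of possible outputs) are notation assumed here for illustration; they do not appear in the listing.

```latex
\[
\Pr[\mathcal{A}(X) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(X') \in S] + \delta
\quad \text{for all neighboring } X, X' \text{ and all output sets } S .
\]
```

Smaller ε and δ mean the tester's output reveals less about any single sample, which is why the sample complexity bounds depend on both parameters.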

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.95)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.70)

7de32147a4f1055bed9e4faf3485a84d-Paper.pdf

Neural Information Processing Systems | Feb-13-2026, 09:40:54 GMT

algorithm, differential privacy, proceedings, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.69)

bc573864331a9e42e4511de6f678aa83-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 22:28:34 GMT

algorithm, closeness, teq, (16 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Optimal Algorithms for Augmented Testing of Discrete Distributions

Neural Information Processing Systems | Feb-8-2026, 08:36:05 GMT

We consider the problem of hypothesis testing for discrete distributions.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(17 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Optimal Algorithms for Augmented Testing of Discrete Distributions

Neural Information Processing Systems | Oct-9-2025, 19:10:29 GMT

We consider the problem of hypothesis testing for discrete distributions.

algorithm, closeness testing, probability, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.92)

Differentially Private Equivalence Testing for Continuous Distributions and Applications

Neural Information Processing Systems | Oct-9-2025, 17:27:06 GMT

We present the first algorithm for testing equivalence between two continuous distributions using differential privacy (DP).

algorithm, estcloseness, sample complexity, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > Czechia > Prague (0.04)
Asia > Middle East > Israel (0.04)
Africa > Sudan (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence (1.00)

Testing Closeness With Unequal Sized Samples

Bhaswar Bhattacharya, Gregory Valiant

Neural Information Processing Systems | Oct-2-2025, 07:23:15 GMT

We consider the problem of testing whether two unequal-sized samples were drawn from identical distributions, versus distributions that differ significantly.
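
To make the two-sample setting concrete, here is a minimal sketch of a simple closeness statistic for samples of unequal sizes. It is an illustrative textbook-style estimator, not the algorithm from the paper listed above. The function name `l2_closeness_estimate` and its parameters are hypothetical. Under the common Poissonization assumption (the count of each symbol is Poisson with mean `m1 * p_i` or `m2 * q_i`), the returned value is an unbiased estimate of the squared L2 distance between the two distributions.

```python
def l2_closeness_estimate(counts_p, counts_q, m1, m2):
    """Estimate ||p - q||_2^2 from per-symbol counts of two samples.

    counts_p, counts_q: dicts mapping symbol -> observed count.
    m1, m2: (expected) sizes of the first and second sample.
    Hypothetical helper for illustration only.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("sample sizes must be positive")
    total = 0
    for symbol in set(counts_p) | set(counts_q):
        x = counts_p.get(symbol, 0)
        y = counts_q.get(symbol, 0)
        # Rescale counts so both samples are comparable, then subtract
        # the variance term so that only (p_i - q_i)^2 remains in expectation.
        total += (m2 * x - m1 * y) ** 2 - (m2 ** 2 * x + m1 ** 2 * y)
    return total / (m1 * m2) ** 2


# Disjoint supports give a large value; identical counts give a value below zero.
print(l2_closeness_estimate({"a": 10}, {"b": 10}, 10, 10))
print(l2_closeness_estimate({"a": 5, "b": 5}, {"a": 5, "b": 5}, 10, 10))
```

A tester built on this sketch would compare the estimate against a threshold that depends on the sample sizes and the distance parameter; the choice of threshold is left open here.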

algorithm, markov chain, statistic, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

bc573864331a9e42e4511de6f678aa83-Paper.pdf

Neural Information Processing Systems | Aug-17-2025, 03:04:02 GMT

artificial intelligence, machine learning, teq, (18 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Replicable Distribution Testing

Diakonikolas, Ilias, Gao, Jingyi, Kane, Daniel, Liu, Sihan, Ye, Christopher

arXiv.org Artificial Intelligence | Jul-4-2025

We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing -- answering an open question from prior work -- and closeness testing.

artificial intelligence, machine learning, probability, (18 more...)

arXiv.org Artificial Intelligence

2507.02814

Country: North America > United States > California (0.28)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data

Aliakbarpour, Maryam, Burudgunte, Arnav, Cannone, Clément, Rubinfeld, Ronitt

arXiv.org Artificial Intelligence | Mar-18-2025

Accurately analyzing data while preserving individual privacy is a fundamental challenge in statistical inference. Since its formulation nearly two decades ago, Differential Privacy (DP) [DMNS06] has emerged as the leading framework for privacy-preserving data analysis, providing strong mathematical privacy guarantees and gaining adoption by major entities such as the U.S. Census Bureau, Amazon [Ama24], Google [EPK14], Microsoft [DKY17], and Apple [Dif17; TVVKFSD17]. Unfortunately, DP guarantees often come at the cost of increased data requirements or computational resources, which has limited the widespread adoption of differential privacy in spite of its theoretical appeal. To address this issue, a recent line of work has investigated whether access to even small amounts of additional public data could help mitigate this loss of performance. Promising results for various tasks have been shown, both experimentally [KST20; LLHR24; BZHZK24; DORKSF24] and theoretically [BKS22; BBCKS23]. The use of additional auxiliary information is very enticing, as such access is available in many real-world applications: for example, hospitals handling sensitive patient data might leverage public datasets, records from different periods or locations, or synthetic data generated by machine learning models to improve analysis. Similarly, medical or socio-economic studies focusing on a minority or protected group can leverage statistical data from the overall population. However, integrating public data introduces its own challenges, as it often lacks guarantees regarding its accuracy or relevance to private datasets.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.14709

Country: